bandit game
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- Government (1.00)
- Leisure & Entertainment > Games (0.68)
Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games
Real-world applications such as economics and policy making often involve solving multi-agent games with two unique features: (1) the agents are inherently asymmetric and partitioned into leaders and followers; (2) the agents have different reward functions, thus the game is general-sum. The majority of existing results in this field focuses on either symmetric solution concepts (e.g., the Nash equilibrium) or zero-sum games. It remains open how to learn the Stackelberg equilibrium---an asymmetric analog of the Nash equilibrium---in general-sum games efficiently from noisy samples. This paper initiates the theoretical study of sample-efficient learning of the Stackelberg equilibrium, in the bandit feedback setting where we only observe noisy samples of the reward. We consider three representative two-player general-sum games: bandit games, bandit-reinforcement learning (bandit-RL) games, and linear bandit games.
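The bandit-game setting in this abstract can be illustrated with a small plug-in sketch: estimate both players' mean rewards from noisy bandit feedback, then compute the Stackelberg equilibrium of the estimated game (the follower best-responds; the leader commits to the action maximizing her own reward given that response). The 3x3 payoff matrices, noise level, and sample counts below are made-up illustrations, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3x3 general-sum bandit game: true (unknown) mean rewards
# for the leader and the follower at each action pair (a, b).
leader_means = np.array([[0.9, 0.2, 0.1],
                         [0.5, 0.6, 0.4],
                         [0.3, 0.8, 0.7]])
follower_means = np.array([[0.1, 0.8, 0.3],
                           [0.7, 0.2, 0.5],
                           [0.4, 0.6, 0.9]])

def sample_rewards(a, b, n):
    """Draw n noisy (bandit-feedback) reward samples for action pair (a, b)."""
    noise = rng.normal(scale=0.1, size=(2, n))
    return leader_means[a, b] + noise[0], follower_means[a, b] + noise[1]

# Uniform exploration: estimate both reward matrices from noisy samples.
n_samples = 2000
leader_hat = np.zeros_like(leader_means)
follower_hat = np.zeros_like(follower_means)
for a in range(3):
    for b in range(3):
        r_l, r_f = sample_rewards(a, b, n_samples)
        leader_hat[a, b] = r_l.mean()
        follower_hat[a, b] = r_f.mean()

# Plug-in Stackelberg equilibrium: for each leader action, the follower
# best-responds to the estimated rewards; the leader commits to the
# action that maximizes her own estimated reward given that response.
best_response = follower_hat.argmax(axis=1)
leader_values = leader_hat[np.arange(3), best_response]
leader_action = int(leader_values.argmax())
print(leader_action, int(best_response[leader_action]))
```

With these (well-separated) payoffs and enough samples, the plug-in solution matches the true Stackelberg equilibrium; the paper's contribution is quantifying how many samples such a scheme provably needs.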
Incentivized Learning in Principal-Agent Bandit Games
Scheid, Antoine, Tiapkin, Daniil, Boursier, Etienne, Capitaine, Aymeric, Mhamdi, El Mahdi El, Moulines, Eric, Jordan, Michael I., Durmus, Alain
This work considers a repeated principal-agent bandit game, where the principal can only interact with her environment through the agent. The principal and the agent have misaligned objectives and the choice of action is only left to the agent. However, the principal can influence the agent's decisions by offering incentives which add up to his rewards. The principal aims to iteratively learn an incentive policy to maximize …

Real-world decision-making problems, however, often present challenges that are not addressed in this simple optimization framework. These include the challenge of scarcity when there are multiple decision-makers, issues of misaligned objectives, and problems arising from information asymmetries and signaling. The economics literature addresses these issues through the design of game-theoretic mechanisms, including auctions and contracts (see, e.g., Myerson, 1989; Laffont & Martimort, 2009), aiming to achieve favorable outcomes despite agents' self-interest and limited information set.
- Europe > Kosovo > District of Gjilan > Kamenica (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
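A minimal toy version of the incentive problem described in this entry, with made-up reward vectors: the agent greedily maximizes his own reward plus the offered transfer, and a fully informed principal can compute the cheapest transfer steering him to each arm. (In the paper's actual setting the principal does not know these quantities and must learn the incentive policy online.)

```python
import numpy as np

# Hypothetical 4-armed setting: the agent's and the principal's mean
# rewards are misaligned (these numbers are made up for illustration).
agent_reward = np.array([0.8, 0.3, 0.5, 0.2])
principal_reward = np.array([0.1, 0.9, 0.4, 0.6])

def agent_choice(incentive):
    """The agent greedily picks the arm maximizing own reward + incentive."""
    return int(np.argmax(agent_reward + incentive))

# With no incentive, the agent picks arm 0 -- the principal's worst arm.
assert agent_choice(np.zeros(4)) == 0

# A fully informed principal can steer the agent to arm k by paying the
# agent's utility gap on arm k (plus a small margin to break ties).
eps = 0.01
def steer_cost(k):
    return agent_reward.max() - agent_reward[k] + eps

# The principal prefers the arm maximizing her reward net of the transfer.
net = np.array([principal_reward[k] - steer_cost(k) for k in range(4)])
best_arm = int(np.argmax(net))
print(best_arm)
```

Here the transfer makes the misaligned arm 1 worthwhile for both parties; the learning problem is to reach this outcome when `agent_reward` is unknown and only noisy bandit feedback is observed.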
Optimistic Thompson Sampling for No-Regret Learning in Unknown Games
Li, Yingru, Liu, Liangqi, Pu, Wenqiang, Liang, Hao, Luo, Zhi-Quan
This work tackles the complexities of multi-player scenarios in unknown games, where the primary challenge lies in navigating the uncertainty of the environment through bandit feedback alongside strategic decision-making. We introduce Thompson Sampling (TS)-based algorithms that exploit the information of opponents' actions and reward structures, leading to a substantial reduction in experimental budgets -- achieving over tenfold improvements compared to conventional approaches. Notably, our algorithms demonstrate that, given specific reward structures, the regret bound depends logarithmically on the total action space, significantly alleviating the curse of multi-player. Furthermore, we unveil the Optimism-then-NoRegret (OTN) framework, a pioneering methodology that seamlessly incorporates our advancements with established algorithms, showcasing its utility in practical scenarios such as traffic routing and radar sensing in the real world.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
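As background for this entry, a plain Gaussian Thompson Sampling loop for a single-player bandit (not the paper's optimistic multi-player variant) looks like the following sketch; the arm means, horizon, and noise level are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 5-armed Gaussian bandit, unknown to the learner.
true_means = np.array([0.1, 0.4, 0.35, 0.8, 0.2])
n_arms, horizon, noise_sd = len(true_means), 3000, 0.5

# Gaussian posterior per arm via running sums/counts (N(0, 1) prior).
counts = np.zeros(n_arms)
sums = np.zeros(n_arms)

for t in range(horizon):
    # Conjugate posterior N(mean, var) for each arm's mean reward.
    post_var = noise_sd**2 / (counts + 1.0)
    post_mean = sums / (counts + 1.0)
    # Thompson step: sample one plausible mean per arm, play the argmax.
    arm = int(np.argmax(rng.normal(post_mean, np.sqrt(post_var))))
    reward = rng.normal(true_means[arm], noise_sd)
    counts[arm] += 1
    sums[arm] += reward

best = int(np.argmax(counts))
print(best, counts[best] / horizon)
```

The loop concentrates its pulls on the best arm; the paper's contribution is an optimistic variant that additionally exploits opponents' observed actions and reward structure in multi-player unknown games.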
When is Offline Two-Player Zero-Sum Markov Game Solvable?
We study which dataset assumptions permit solving offline two-player zero-sum Markov games. In stark contrast to the offline single-agent Markov decision process, we show that the single-strategy concentration assumption is insufficient for learning a Nash equilibrium (NE) strategy in offline two-player zero-sum Markov games. On the other hand, we propose a new assumption named unilateral concentration and design a pessimism-type algorithm that is provably efficient under this assumption. In addition, we show that the unilateral concentration assumption is necessary for learning an NE strategy. Furthermore, our algorithm can achieve minimax sample complexity without any modification for two widely studied settings: datasets with the uniform concentration assumption and turn-based Markov games. Our work serves as an important initial step towards understanding offline multi-agent reinforcement learning.
- North America > United States (0.14)
- Europe > United Kingdom > England > Greater London > London (0.14)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)
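For contrast with the offline setting of this entry, online computation of a zero-sum NE in a known matrix game is classical: fictitious play drives the players' empirical average strategies toward a Nash equilibrium. The sketch below runs it on matching pennies, whose equilibrium is the uniform mixture with game value 0; the payoff matrix and iteration count are illustrative.

```python
import numpy as np

# Matching pennies payoff for the row player (zero-sum: column gets -A).
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

# Fictitious play: each player best-responds to the opponent's empirical
# mixture of past actions; in zero-sum games the average strategies
# converge to a Nash equilibrium.
rows = np.zeros(2)
cols = np.zeros(2)
rows[0] += 1  # arbitrary initial actions
cols[0] += 1
for _ in range(20000):
    r = int(np.argmax(A @ (cols / cols.sum())))   # row best response
    c = int(np.argmin((rows / rows.sum()) @ A))   # column best response
    rows[r] += 1
    cols[c] += 1

x, y = rows / rows.sum(), cols / cols.sum()
value = float(x @ A @ y)
print(np.round(x, 2), round(value, 3))
```

The offline question studied in the paper is precisely when such an NE strategy can instead be recovered from a fixed dataset, without the ability to pick actions online.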